Unravelling Unstructured Programs 





G. Oulsnam* 


Department of Computer Science, University of Queensland, St. Lucia, Queensland 4067, Australia 





A method is presented for converting unstructured program schemas to strictly equivalent structured form. The 
predicates of the original schema are left intact with structuring being achieved by the duplication of the original 
decision nodes without the introduction of compound predicate expressions, or, where possible, by function duplication 
alone. It is shown that structured schemas must have at least as many decision nodes as the original unstructured 
schema, and must have more when the original schema contains branches out of alternation constructs. The structuring 
method allows the complete avoidance of function duplication, but only at the expense of decision node duplication. It 
is shown that structured schemas always require an increase in space-time requirements, and it is suggested that this 
increase can be used as a complexity measure for the original schema. 


1. INTRODUCTION 


This paper presents a method for transforming unstruc- 
tured program flowgraphs into structured equivalents in 
D-chart format.¹ The form of the derived structured
programs is such that the original unstructured programs 
can be easily recovered, thus revealing what overheads 
in space and time are inherent in the structured forms. 
The method enables the user to opt for minimization of 
time overheads, minimization of space overheads, or 
some intermediate compromise. A measure for the 
introduced overheads is given which can be used to 
compare the relative conceptual complexities of unstruc- 
tured programs. A feature of the structuring method is 
that the number of introduced auxiliary Boolean varia- 
bles, or flags, is kept to a minimum, and where such flags 
are introduced, they correspond exactly to some condi- 
tional expression, or predicate, in the original program. 
Thus the method preserves as far as possible the logic of 
the original program. 

The problem of transforming flowgraphs into some 
standard form has been widely addressed in the literature. 
Methods based on yielding a flowgraph in while-program 
form have been given by Jacopini,² Ashcroft and
Manna,³ Knuth and Floyd,⁴ Bruno and Steiglitz,¹ Mills,⁵
Kasai,⁶ Williams,⁷ Williams and Ossher⁸ and Oulsnam.⁹
Jacopini’s method was shown by Cooper¹⁰ to yield in a
trivial way a flowgraph consisting of a single while
statement enclosing a sequence of alternations based on
introduced auxiliary variables. Jacopini’s conjecture that
in general auxiliary variables would be necessary to
transform arbitrary flowgraphs into D-chart form was
proved by Ashcroft and Manna,³ Knuth and Floyd⁴ and
Bruno and Steiglitz.¹ Kasai⁶ and Bainbridge¹¹ describe
methods of reducing while-programs to minimal form,
while the general capabilities and limitations of D-charts
as a standard form were considered by Paterson, Kasami
and Tokura¹² and Kosaraju.¹³

*Present address: Department of Computer Science, University
College, Cork, Eire.

The necessity for auxiliary variables, coupled with the
fact that flowgraphs in while-program form were shown
by Paterson et al.¹² to generally require some duplication
of basic functions or predicates of the original flowgraph,
has led to consideration of more general standard forms
than D-charts. Wulf¹⁴ proposed the use of multi-level
control structures and further generalizations were
analysed by Kosaraju¹³ and Ledgard and Marcotty,¹⁵
with some refinements of their results by Cherniavsky et
al.¹⁶ Proposals based on the non-duplication of the
original flowgraph’s functions and predicates have been
given by Urschler¹⁷ (using a technique based on back-
dominators of the original flowgraph), and Baker.¹⁸ Of
necessity both methods allow the use of GOTO statements
although Ref. 17 restricts these to backward jumps only.
A standard form based on binary trees has been proposed
by Engeler¹⁹ and Wegner.²⁰ The former allows only
jumps to ancestor nodes in the tree, while the latter allows
jumps in both directions. Proposals to convert flowgraphs
to recursive form have been made by Knuth and Floyd⁴
and Urschler.¹⁷ McCabe²¹ and Williams⁷ independently
identified the basic forms of unstructuredness, and
transformations based on the identification and elimination
of these constructs have been given by Williams,⁷
Williams and Ossher⁸ and Oulsnam.⁹ Dijkstra,²² echoed
by Knuth,²³ has cautioned against expecting mechanical
transformation of flowgraphs to yield more comprehen-
sible programs, while Knuth²³ has examined the problem
of efficiency relating to programs translated to standard
(or structured) form. Van Emden²⁴ has dismissed the
need for structured programming altogether and proposes 
a method for deriving programs directly with minimal 
function and predicate duplication. 

The remainder of this paper is organized as follows. 
Section 2 introduces some necessary definitions and 
concepts, Section 3 briefly reviews the basic forms of 
unstructuredness in flowgraphs while in Section 4 a 
method for their removal based on structured transforms 
is given. The proof of the effectiveness of the structuring 
algorithm is given in Section 5. Section 6 contains an 
example of the use of the method and the paper concludes 
in Section 7 with a discussion on the space-time efficiency 
of the structuring transforms. 





2. SCHEMAS 


The method of structuring to be introduced in Section 4 
involves transformations on program flowgraphs or 


CCC-0010-4620/82/0025-0379 $04.50 


© Heyden & Son Ltd, 1982 


THE COMPUTER JOURNAL, VOL. 25, NO. 3,1982 379 




schemas.²⁵ This Section briefly reviews schemas and an
associated algebra for describing them. 

A schema shows the control structure of the program 
whilst leaving the details of the program’s computation 
to be defined as an interpretation of the schema. A 
schema therefore represents a family of distinct programs 
sharing a common control structure. Each program of a 
schema is considered to operate on three types of 
variables: 


input variables   $x_1, \ldots, x_a$
local variables   $y_1, \ldots, y_b$
output variables  $z_1, \ldots, z_c$


which are represented collectively by X, Y and Z 
respectively. The operations of the program on these 
variables are of two types: 


functions   $f_1(X, Y), \ldots, f_n(X, Y)$
predicates  $p_1(X, Y), \ldots, p_m(X, Y)$


Functions map their arguments into either Y or Z, while
predicates map theirs into {true, false}. The composition
of functions such as $f_i(X, f_j(X, Y))$ is denoted by
$f_j(X, Y).f_i(X, Y)$, where the full point (.) denotes the
sequencing operator. The logical negation of a predicate
$p_i(X, Y)$ is denoted by $\bar{p}_i(X, Y)$. For both functions and
predicates the argument list usually will be elided.

The specification of the variables, functions and
predicates for a particular program is called an inter-
pretation of the schema. The transformational structuring
process described in this paper is independent of such
interpretations.

Schemas are constructed by composition of the
following statements.


START: [diagram omitted] which is to be understood as an abbreviation for
S: Y = f(X), where S is a program (node) label.

ASSIGN: [diagram omitted] which denotes i: Y = $f_j(X, Y)$.

TEST: [diagram omitted] which denotes i: IF $p_j(X, Y)$ GOTO the left branch
target node ELSE GOTO the right branch target node.

HALT: [diagram omitted] which denotes i: Z = f(X, Y).




Every schema consists of exactly one START and one 
HALT statement, and any number of uniquely labelled 
ASSIGN and TEST statements such that every statement 
lies on some path from START to HALT. The node of a 
TEST statement is called a decision node, and that of an 
ASSIGN statement a collecting node. 

In addition to the geometrical representation of 
schemas it is advantageous to have an algebraic repre- 
sentation as an aid to the transformational process. 
Following Kleene²⁶ it is known that the computations
associated with a flowgraph schema can be represented 
by a regular set. The regular expressions of the set are 
derived by regarding the schema as a finite state generator 
(fsg) whose states correspond to the nodes of the 
flowgraph and whose transitions correspond to traversal 
of flowgraph edges. Each transition causes the function 
or predicate identifier of the corresponding edge to be 
appended to the fsg’s output string. Any string output by 
the fsg in going from the START node to the HALT 
node represents a possible computation sequence of the 
schema, and the set of all such strings represents the 
schema’s computation sequence set. The regular set 
operators of union (+), concatenation (.) and Kleene 
star closure (*) are related to schema operations and 
statements as follows: 


(a) + (b)     [diagram omitted]
(a).(b)       [diagram omitted]
(a)*.(b)      [diagram omitted]


Here, ‘a’ and ‘b’ denote strings of function and predicate 
identifiers and the star closure postfix operator means 
‘concatenated zero or more times’. The string grouping 
operators (,) will be elided where their omission does not 
cause ambiguity. 

A complete forward path in a schema is any path that 
begins at the START node and ends at the HALT node 
without going through the same node twice. An edge of 
the flowgraph on some complete forward path is called a 
forward edge, and an edge which is not a forward edge is 
called a backward edge. Any path which does not include 
a backward (forward) edge is called a forward (backward) 
path. 

The end set E(i, j) of a node ‘i’ with respect to a not
necessarily distinct node ‘j’ is defined²⁷ as the set of
strings that the fsg would output in traversing all paths
from ‘i’ to ‘j’. Thus E(S, H) is the computation sequence




set of the schema. For brevity E(i, H) will be written 
E(i). By convention E(H) = (), the empty string. 


Structured schemas 


A structured regular expression (sre) is defined recursively 
as follows: 


(1) Functions and predicates are sre’s. 

(2) If x and y denote any two sre’s and p is a predicate
then the following are also sre’s: (a) a sequence
$x.y$ (b) a decision $(p.x + \bar{p}.y)$ (c) a loop
$(x.p.y)^*.x.\bar{p}$ or equivalently $x.(p.y.x)^*.\bar{p}$.


The familiar WHILE-DO construct is a special case of 
the loop in which x = (), whilst the REPEAT-UNTIL 
construct is obtained by setting y = () instead. For loops, 
$x.\bar{p}$ is a forward path with respect to the loop’s entry and
exit nodes whilst $p.y$ is a backward path. A loop here is
what Dijkstra reportedly²³ termed an n + ½ loop.

It is to be noted that whilst the flowgraph for a loop 
contains only one instance each of x and y, the 
corresponding regular expressions given in 2(c) above 
each contain two occurrences of x. In fact the second of 
the two expressions can be written in programming terms 
as: 


x; WHILE p DO BEGIN y; x END; 


showing that it is always possible (but only at the expense 
of duplicating the function on the forward path) to 
express a loop in terms of the WHILE...DO construct. 
Throughout this paper the (n + ½) loop is taken as the
terminal form for a structured loop since it contains both 
the WHILE...DO and REPEAT... UNTIL constructs 
as special cases and, as just seen, can always be converted 
to WHILE... DO format if so desired. 
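
The equivalence of the two loop forms can be checked mechanically. The following Python sketch is illustrative only and is not part of the structuring method: the function names are hypothetical, and the predicate p is modelled, as an assumption, by a pre-recorded list of truth values (one per evaluation). It compares the sequence of functions executed by the (n + ½) loop with that executed by the WHILE form in which x is duplicated.

```python
def n_and_half_loop(p_outcomes):
    """Trace of the (n + 1/2) loop: do x, leave if p is false, else do y and repeat."""
    outcomes = iter(p_outcomes)
    trace = []
    while True:
        trace.append("x")
        if not next(outcomes):  # p false: take the forward (exit) path
            break
        trace.append("y")       # p true: take the backward path
    return trace


def while_form(p_outcomes):
    """Trace of: x; WHILE p DO BEGIN y; x END (note the duplicated x)."""
    outcomes = iter(p_outcomes)
    trace = ["x"]
    while next(outcomes):
        trace.append("y")
        trace.append("x")
    return trace


def outcome_sequences(max_passes):
    """Predicate histories: p true for k passes, then false, for k = 0..max_passes."""
    return [[True] * k + [False] for k in range(max_passes + 1)]


def forms_agree(max_passes=5):
    """True if both loop forms execute the same functions for every history tried."""
    return all(
        n_and_half_loop(seq) == while_form(seq)
        for seq in outcome_sequences(max_passes)
    )
```

Agreement on a bounded set of predicate histories is evidence of equivalence, not a proof; the proof is the identity of the two regular expressions in 2(c).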

A schema is structured if and only if its computation 
sequence set E(S) is a sre. 


3. THE BASIC FORMS OF 
UNSTRUCTUREDNESS 


There are six basic unstructured forms (buf’s) that can 
occur in a schema: jump into a decision—ID; jump out 
of a decision—OD; jump into the forward path of a 
loop—IL; jump out of a forward path of a loop—OL; 
jump into the backward path of a loop—IB; and jump 
out of the backward path of a loop—OB. These are 
depicted in Fig. 1. The last two, IB and OB, are additional 
to the forms considered by McCabe.²¹ Referring to Fig.
1, there are three possible placings for the node E: on a
path from the START node S to node A; on a path from 
node C to the HALT node H; on a path from S to H 
which does not include nodes A, B or C. Analysis of all 
possible placings of node E with respect to each of the six 
buf’s shows that unstructuredness always occurs in 
possibly overlapping combinations of the six unstruc- 
tured subgraphs depicted and named in Fig. 2, and that 
none of the basic forms can ever occur by itself. For 
example, the schema of Fig. 7 comprises one instance 
each of LD, DL and LL. McCabe²¹ and Williams⁷
independently derived the forms described here as DD,
DL, LD and LL. Williams⁷ added a fifth form called
parallel loops but, as he recognized, this form is 


Figure 1. The six basic unstructured forms. [Diagrams omitted: ID, OD, IL, OL, IB, OB.]

Figure 2. The six forms of unstructuredness. [Diagrams omitted: DD, DL, LD, LL, BL, LB.]


expressible in terms of the other four under the restriction 
of a single HALT node. 

Examination of each of the six forms DD...LB of 
Fig. 2 reveals that they are constructed from pairs chosen 
from the basic forms ID, OD, IL and OL. Thus DD = 
ID + OD; DL = ID + IL; LD = OD + OL; LL = IL + 
OL; BL = ID + OL; LB = OD + IL. (For instance, for 


Figure 3. Jump into a decision and its structured forms. [Diagrams omitted: ID, ID-0, ID-1.]


BL the ID component is obtained with node 1 equivalent
to B, node 2 to C, node 3 to A and the immediate
predecessor of node 1 to E. The OL component comprises
the loop 2-4-3-2 with node 2 equivalent to A, node 3 to
C, node 4 to B and the immediate successor node of the
BL construct to E.) From this it follows that it is sufficient
to consider just ID, OD, IL and OL as the basic units of 
unstructuredness whose removal will result in a structured 
schema. In fact, since none of these can occur alone in a 
schema, it is sufficient to consider any three of them as 
the minimum set for removal. 


4. THE STRUCTURING TRANSFORMS 


Two schemas having identical functions and predicates 
are computationally equivalent if their computation 
sequence sets are described by the same regular set. The 
first step in the structuring process is therefore to recast 
the regular expressions describing the buf’s into sre 
formats. The strategy for transforming an unstructured 
schema into structured form is then as follows. (1) 
Identify a buf and replace it with a computationally 
equivalent but structured subgraph. (Since buf’s cannot 
occur alone, a second buf will also be removed.) (2) 
Repeat the process until a structured schema is obtained. 

In this Section the structured equivalents of the buf’s 
are derived, whilst a proof that the structuring procedure 
can always be applied and will always terminate is given 
in Section 5. 

Consider first ID, Fig. 3, for which it is required to
find an sre for E(A) + E(E). From Fig. 3 it is seen that

\[
\begin{aligned}
E(A) &= q.e.E(C) + \bar{q}.d.E(B) \\
E(E) &= b.E(B) \\
E(B) &= a.E(C)
\end{aligned}
\tag{1}
\]


Bainbridge¹¹ has given three rules for solving end set
equations to yield sre’s. Letting x, y denote sre’s and p a
predicate these are:

1. if $E(v) = x.E(u)$ or $E(v) = x$, then eliminate $E(v)$ by
substitution;

2. if $E(v) = p.x.E(u) + \bar{p}.y.E(u)$
then deduce $E(v) = (p.x + \bar{p}.y).E(u)$;

3. if $E(v) = p.x.E(v) + \bar{p}.y.E(u)$ or
$E(v) = p.x.E(v) + \bar{p}.y$ then deduce
$E(v) = (p.x)^*.\bar{p}.y.E(u)$ or $E(v) = (p.x)^*.\bar{p}.y$
respectively.


Bainbridge asserts that if application of these rules yields 
a sre then the sre is minimal with respect to a count of the 
number of occurrences of functions and predicates, but 
if a stage is reached where none of the rules can be 
applied then there is no sre solution. Applying these rules 
to the end set equations for ID to eliminate E(B) gives: 


\[
\begin{aligned}
E(A) &= (q.e + \bar{q}.d.a).E(C) \\
E(E) &= b.a.E(C)
\end{aligned}
\tag{2}
\]


which is in the required sre format. However, unlike 
Eqns (1), Eqns (2) contain one duplication of identifier 
‘a’. The flowgraph corresponding to Eqns (2) is shown in 
Fig. 3 as ID-0, the 0 denoting no duplication of the 
predicate ‘q’. 

Figure 4. Jump into a loop and its structured forms. [Diagrams omitted: IL, IL-0, IL-1.]

For IL, Fig. 4, the end set equations are:

\[
\begin{aligned}
E(A) &= a.E(B) \\
E(E) &= b.E(B) \\
E(B) &= c.E(C) \\
E(C) &= q.e.E(A) + \bar{q}.
\end{aligned}
\]

Eliminating E(B) and E(C) gives

\[
E(A) = a.c.(q.e.E(A) + \bar{q})
\]

from which can be shown

\[
E(A) = a.(c.q.e.a)^*.c.\bar{q} = a.E(B'), \text{ say,}
\]

and

\[
E(E) = b.E(B')
\]


to give IL-0, Fig. 4. (Actually IL-0 can be obtained 
directly by simply substituting for E(A) in the equation 
for E(C).) Again it has been necessary to duplicate just 
edge ‘a’ to achieve sre format. 
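
The claim that IL and IL-0 have the same computation sequences can be spot-checked by enumerating end sets up to a bounded string length. The Python sketch below is a hedged illustration, not the paper's procedure: the dictionary encoding of end set equations, the token `~q` standing for the negated predicate, the node names `B'` and `C'` and the function name `end_set` are all assumptions introduced here.

```python
def end_set(equations, start, max_len):
    """All token strings of length <= max_len generated from node `start`.

    `equations` maps a node to a list of (tokens, target) alternatives;
    a target of None means the path has reached the exit.
    """
    results = set()
    frontier = [((), start)]
    while frontier:
        prefix, node = frontier.pop()
        for tokens, target in equations[node]:
            string = prefix + tokens
            if len(string) > max_len:
                continue  # prune: every alternative adds at least one token
            if target is None:
                results.add(string)
            else:
                frontier.append((string, target))
    return results


# Original IL: E(A) = a.E(B), E(E) = b.E(B), E(B) = c.E(C),
#              E(C) = q.e.E(A) + ~q
IL = {
    "A": [(("a",), "B")],
    "E": [(("b",), "B")],
    "B": [(("c",), "C")],
    "C": [(("q", "e"), "A"), (("~q",), None)],
}

# IL-0: E(A) = a.E(B'), E(E) = b.E(B'), E(B') = (c.q.e.a)*.c.~q,
# encoded with a hypothetical helper node C' inside the loop.
IL_0 = {
    "A": [(("a",), "B'")],
    "E": [(("b",), "B'")],
    "B'": [(("c",), "C'")],
    "C'": [(("q", "e", "a"), "B'"), (("~q",), None)],
}
```

Comparing `end_set(IL, n, k)` with `end_set(IL_0, n, k)` for the entry nodes A and E and a modest bound k shows the two schemas generating the same strings, with edge ‘a’ appearing twice in the IL-0 equations.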

Now consider OD, Fig. 5. In this case a sre for E(A) is
required since B-E is an outgoing edge from the decision.
The end set equations are:

\[
\begin{aligned}
E(A) &= q.e.E(C) + \bar{q}.d.E(B) \\
E(B) &= p.b.E(E) + \bar{p}.c.E(C)
\end{aligned}
\tag{3}
\]


Since an expression for E(A) is required in terms of E(C) 
and E(E) it is necessary to eliminate E(B), but none of 
the Bainbridge rules can be applied to achieve this. Thus 
OD cannot be structured by function replication alone, 
so predicate replication must be considered instead. 

Predicate replication is achieved by introducing 
auxiliary predicate variables, or flags, with identifiers 
distinct from those of the schema’s functions, predicates 
and variables. In order to preserve the schema’s computations
over its variables, the flags are introduced in the
following way. At a TEST statement node, the TEST
predicate ‘p’ is computed as before, but its value is
immediately assigned to a flag ‘P’ uniquely associated
with the predicate. It is this flag that is used, rather than
the original predicate, as the discriminant in choosing
the exit path from the TEST node. In programming
terms,


IF p THEN ... ELSE ...
is replaced by
P = p; IF P THEN ... ELSE ...





Figure 5. Jump out of a decision. 




Whereas in the original schema the value of a predicate 
is known only at its point of computation (since 
subsequent functions will in general change the values of 
the predicate’s arguments), the introduction of a corre- 
sponding flag preserves the predicate’s value until that 
value is recomputed. In the sequel the convention is 
adopted that schema predicates will be denoted by p, q, 
r,... and the corresponding uniquely associated flags by 
P,Q,R.... 

It will also prove convenient to allow direct assignment 
of truth values to flags. Again, such assignments do not 
affect the original schema’s computations. Bainbridge’s 
rules can now be extended to allow for the introduction 
of flags as follows, where ‘1’ denotes TRUE, ‘0’ denotes
FALSE and ‘∅’ denotes the non-existent or null string:


4. if $E(u) = p.a.E(v) + \bar{p}.b.E(w)$ introduce
$E(u) = (P = p).(P.a + \bar{P}.b).(P.E(v) + \bar{P}.E(w))$
5. from $(P = 1).(P.x + \bar{P}.y)$ deduce $x$ and
from $(P = 0).(P.x + \bar{P}.y)$ deduce $y$
6. if x, y do not contain assignments to P, then from
$P.x.P.y$ deduce $P.x.y$ and from $P.x.\bar{P}.y$ deduce $\emptyset$
7. from $(P + \bar{P}).E(u)$ deduce $E(u)$
8. from $\emptyset.x$ and $x.\emptyset$ deduce $\emptyset$.

In each of rules 4-7, P and $\bar{P}$ can be interchanged.
Returning to the end set Eqns (3) for OD, the term
$q.e.E(C)$ can be recast as

\[
q.e.(P = 0).(P.E(E) + \bar{P}.E(C))
\]

and $E(B)$ as

\[
(P = p).(P.b + \bar{P}.c).(P.E(E) + \bar{P}.E(C))
\]

to yield the sre:

\[
E(A) = (q.e.(P = 0) + \bar{q}.d.(P = p).(P.b + \bar{P}.c)).(P.E(E) + \bar{P}.E(C))
\]


The corresponding flowgraph is shown in Fig. 5 as 
OD-1, the ‘1’ indicating that the TEST statement at node 
B has been duplicated at B’. (In the Figure the assignment
P = p has been ‘pushed back’ onto the A-B path for the 
sake of giving a slightly simpler flowgraph.) 
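
The effect of OD-1 can be illustrated in program form. The Python sketch below is an assumption-laden illustration rather than the paper's own code: the function names are hypothetical, functions are modelled by appending their identifiers to a trace, and the predicates q and p are passed in as already-known truth values. The first function follows the unstructured OD flowgraph with early exits standing in for GOTOs; the second follows the sre for E(A), testing only the flag P at the duplicated decision node B’.

```python
from itertools import product


def od_original(q, p):
    """Unstructured OD: returns (functions executed, node reached)."""
    trace = []
    if q:
        trace.append("e")
        return trace, "C"
    trace.append("d")
    if p:                      # decision node B: the jump out of the decision
        trace.append("b")
        return trace, "E"
    trace.append("c")
    return trace, "C"


def od_1(q, p):
    """Structured OD-1: flag P records p, and B' tests the flag only."""
    trace = []
    if q:
        trace.append("e")
        P = False              # (P = 0): direct assignment of a truth value
    else:
        trace.append("d")
        P = p                  # (P = p): the predicate is computed once
        if P:
            trace.append("b")
        else:
            trace.append("c")
    exit_node = "E" if P else "C"   # duplicated decision node B'
    return trace, exit_node


def od_forms_agree():
    """True if both forms agree for every combination of predicate values."""
    return all(
        od_original(q, p) == od_1(q, p)
        for q, p in product([False, True], repeat=2)
    )
```

No function identifier appears twice in `od_1`; the cost is the extra test on P and the two assignments to it.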

Now consider OL, Fig. 6. For this buf

\[
E(A) = a.p.b.E(E) + a.\bar{p}.c.(q.e.E(A) + \bar{q}).
\]

Figure 6. Jump out of a loop. [Diagrams omitted: OL, OL-1.]

Introducing flags for ‘p’ and ‘q’, and after some
rearrangement of terms, then

\[
E(A) = a.(P = p).(\bar{P}.c.(Q = q).Q.e.E(A) + P.b.E(E) + \bar{P}.c.(Q = q).\bar{Q}).
\]

By expansion, the three disjunctive terms in parentheses
can be written respectively as:

\[
(P.b.(Q = 0) + \bar{P}.c.(Q = q)).Q.e.E(A)
\]

\[
P.b.(Q = 0).\bar{Q}.P.E(E) \quad \text{and}
\]

\[
\bar{P}.c.(Q = q).\bar{Q}.\bar{P}
\]

Using the expansion formula

\[
P.u.P.x + \bar{P}.v.\bar{P}.y = (P.u + \bar{P}.v).(P.x + \bar{P}.y)
\]

on the last two expressions, and then collecting terms,
gives the sre

\[
E(A) = a.(P = p).(P.b.(Q = 0) + \bar{P}.c.(Q = q)).(Q.e.E(A) + \bar{Q}.(P.E(E) + \bar{P}))
\]

depicted as OL-1 in Fig. 6, where the original decision
node B has been duplicated at B’.
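
As with OD-1, the behaviour of OL-1 can be compared with the original OL by simulation. The following Python sketch is illustrative only: the function names, the label "exit" for the loop's normal successor, and the modelling of p and q as pre-recorded lists of truth values consumed one per evaluation are assumptions made here.

```python
def ol_original(p_outcomes, q_outcomes):
    """Unstructured OL: a loop with a jump out of its forward path at node B."""
    p_iter, q_iter = iter(p_outcomes), iter(q_outcomes)
    trace = []
    while True:
        trace.append("a")
        if next(p_iter):           # decision node B: jump out of the loop
            trace.append("b")
            return trace, "E"
        trace.append("c")
        if not next(q_iter):       # normal loop exit
            return trace, "exit"
        trace.append("e")


def ol_1(p_outcomes, q_outcomes):
    """Structured OL-1: flags P and Q, with B duplicated as a test on P after the loop."""
    p_iter, q_iter = iter(p_outcomes), iter(q_outcomes)
    trace = []
    while True:
        trace.append("a")
        P = next(p_iter)           # (P = p)
        if P:
            trace.append("b")
            Q = False              # (Q = 0) forces the loop to terminate
        else:
            trace.append("c")
            Q = next(q_iter)       # (Q = q)
        if not Q:
            break
        trace.append("e")
    return trace, ("E" if P else "exit")   # duplicated decision node B'


def ol_histories(max_passes):
    """Predicate histories giving k full passes followed by either kind of exit."""
    for k in range(max_passes + 1):
        yield [False] * k + [True], [True] * k          # leave via the jump to E
        yield [False] * (k + 1), [True] * k + [False]   # leave via the normal exit
```

Both forms evaluate p and q the same number of times and in the same order, so only the flag assignments and the extra test on P are added by the transform.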

Whilst OD-1 and OL-1 each contain one duplicated 
decision node, neither contains the duplication of a 
function. It is also possible to structure both ID and IL 
without function duplication at the expense of one 
duplicated decision node as shown by ID-1 and IL-1 in 
Figs 3 and 4 respectively. 

Thus each of the basic unstructured forms OD, ID, IL 
and OL can be structured at the expense of at most one 
duplicated decision node and no function duplication, 
but only ID and IL can be structured by function 
duplication alone. 

As noted in Section 3, the six paradigms of unstruc- 
turedness are composed of pairs of basic unstructured 
forms: 


DD = ID + OD; DL = ID + IL; LD = OD + OL;
LL = IL + OL; BL = ID + OL; LB = OD + IL


and all of these except LD can be structured without 
decision node duplication by a suitable application of 
either ID-0 or IL-0. However, as LD consists of OD + 
OL it can only be structured at the expense of one 
introduced decision node using either OD-1 or OL-1. 
Since the former requires only one flag to the latter’s two, 
OD-1 is the preferred choice. 

It remains to be established under what conditions, if 
any, the transforms can be applied in the presence of 
overlapping buf’s. This is taken up in the next Section. 


5. EFFECTIVENESS OF THE TRANSFORMS 


A transform is considered to be effective only if it results 
in another valid schema and if it gives a reduction in the 
total number of buf’s left in the schema. 

In the previous Section it was assumed in the derivation 
of the structuring transforms that there was no overlap 
between the buf’s. The effect of overlap is to introduce 
decision or collecting nodes on what would otherwise be 
edges of buf’s, and there is then no guarantee that the 
structuring transforms can still be applied effectively. In 
this Section it is shown that whilst some forms of overlap 
can invalidate certain transforms, nonetheless for every 
schema there is always at least one transform that can be 




applied effectively, thus proving that every schema can 
be progressively transformed into structured format. 

Consider ID, Fig. 3, and transform ID-1. The 
introduction of a collecting node on any of the edges 
A-B, A-C or B-C of ID gives rise to one additional 
instance of ID. Application of ID-1 leaves the introduced 
ID intact, but still effectively removes the original ID. In 
fact it can be seen from Fig. 3 that ID-1 remains effective
in the presence of any number of collecting nodes on the 
edges of ID and also for any number of introduced 
decision nodes. 

Similarly, ID-0 is effective with respect to introduced 
nodes on edges A-B and A-C of ID, but, because of edge 
duplication, not for nodes on B-C. Consider an intro- 
duced collecting node B’ on B-C, and let B’ have an 
external immediate predecessor E’. The subgraph AB’CE’
can be regarded as an ID form with node B a collecting 
node on the A-C edge for which, as already seen, ID-0 is 
effective. This argument can be extended to any number 
of collecting nodes on edge B-C of the original ID. With 
regard to introduced decision nodes on edge B-C, the use 
of ID-0 is inappropriate as at best it would give rise to 
duplicated predicates—precisely what ID-0 was designed 
to avoid. However, as noted above, ID-1 can be used 
instead. Hence: 


Lemma 1. There is always an effective transform for the 
removal of ID constructs from any schema. 


Consider OD and OD-1, Fig. 5. The presence of 
collecting nodes on the edges A-B, A-C or B-C introduces 
instances of ID. From Lemma 1 it is known that these
instances can always be removed effectively, so it remains 
to consider introduced decision nodes alone. It can be 
seen from Fig. 5 that OD-1 remains effective for decision 
nodes on A-B and A-C, but not for those on B-C. (Since 
E can always be chosen to make B-E an edge, there is no 
need to consider nodes on B-E.) Let B' be the decision 
node on B-C which has node C as an immediate 
successor, and let E’ be its other immediate successor. 
Subgraph AB’CE’ is now an instance of OD with decision 
nodes on its A-B’ edge, for which OD-1 is effective. Hence:


Lemma 2. In the absence of ID constructs there is always 
an effective transform for the removal of OD constructs. 


Lemmas 1 and 2 together assert that it is always
possible to transform an unstructured schema to one that 
contains no instances of ID or OD. Thus it is now 
necessary to consider schemas containing only IL and 
OL constructs. Since, as noted in Section 3, neither IL 
nor OL occur in combination with themselves, the only 
possible remaining constructs are of the form IL + OL, 
and the effective removal of one component guarantees 
the removal of the other. 

Consider IL and its transforms, Fig. 4. Any nodes 
introduced on the B-C or C-A edges leave both IL-0 and
IL-1 as effective transforms. Collecting nodes on A-B 
introduce further instances of IL, and there is always one 
of these that has no collecting nodes on its corresponding 
A-B edge. Thus IL-0 and IL-1 can always be applied 
effectively to this IL construct in the absence of decision 
nodes on the A-B edge. 

It remains only to consider decision nodes on the A-B 
path. Let these nodes be B1, ..., Bn with corresponding
external target nodes E1, ..., En. Since by assumption
all instances of ID and OD have been removed, all the
edges B1-E1, ..., Bn-En are backward edges, and
together give rise to other instances of IL + OL, that is, 
of LL. Since the schema is finite, there must be at least 
one LL that has no backward edge leading out of it. (The 
first instance of LL encountered on a forward path from 
START is an example.) But an LL construct which has 
no such edge cannot have an (introduced) decision node 
on any of its edges and, in particular, its IL component 
cannot have an introduced decision node on its A-B edge. 
As already noted, both IL-0 and IL-1 can be applied 
effectively to this instance of IL. 


Lemma 3. In the absence of ID and OD constructs, there 
is always an effective transform for the removal of IL 
constructs, and hence of OL constructs as well. 


Lemmas 1-3 taken together lead to the principal result 
of this paper. 


Theorem. It is always possible to put an unstructured 
schema into a computationally equivalent structured 
form using only the transforms ID-0, ID-1, OD-1, IL-0 
and IL-1. 


In fact, as is easily shown, the result can be strengthened 
to use ID-1, OD-1 and IL-1 as the minimum set if the 
avoidance of decision node duplication is not a 
requirement. 





6. AN EXAMPLE 


Consider the schema shown in Fig. 7. This is a slightly 
generalized version of a schema which Tausworthe²⁸
calls Flynn’s Problem No. 5. To recover the original 
problem from Fig. 7 it is merely necessary to set ‘f’, ‘g’ 
and ‘h’ to (), the empty string. It can be seen that the 
schema comprises one instance each of LD (OL + OD), 
DL (ID + IL), and LL (IL + OL). Two structuring 
strategies might be: (1) avoid function duplication; (2) 
avoid decision node duplication (as far as possible). For 
the purposes of illustration the second approach will be 
chosen here. 

Since decision node duplication is to be avoided it is 
required to apply the Type-0 transforms of ID and IL 
wherever possible, but the presence of LD in the schema 
suggests that some decision node duplication is 
unavoidable. 

Consider first the ID construct comprising nodes B, C, 
E with external node F. The presence of decision node D 
on path C-E precludes the use of ID-0. Next consider the 
IL comprising nodes C, E, F with external node B. The 
decision node D on path C-E again prevents the use of a 
Type-0 transform—this time IL-0. Next consider the IL 
comprising nodes A, C, D with external node F. Now it 
is decision node B on path A-C that prevents the use of 
IL-0 and so there is no effective Type-0 transform
available.

Figure 7. Generalized Flynn's Problem No. 5. [Diagram omitted.]

Figure 8. Structured form of Flynn's Problem. [Diagram omitted.]

Applying ID-1 removes the DL construct to leave a
schema with one LL and one LD for which again no
Type-0 transform is effective. Applying ID-1 and finally
IL-1 gives the structured schema shown in Fig. 8. The
end set equations for this schema are:


\[
\begin{aligned}
E(S) &= (Q = 1).E(D'') \\
E(D'') &= (Q.a.(P = p).(P.f + \bar{P}.e) + \bar{Q}.h). \\
&\quad (P.b.(Q = q).(Q.d + \bar{Q}.g) + \bar{P}.(Q = 0)). \\
&\quad (Q.E(D'') + \bar{Q}.c.(r.(Q = 0).E(D'') + \bar{r}))
\end{aligned}
\]

from which it can be seen that the assignment Q = 0 is
redundant in the expression $\bar{Q}.c.r.(Q = 0).E(D'')$ and
so can be eliminated. Thus the structured schema
contains one duplication of decision node ‘p’ and two of 
‘q’, together with four assignments to flags. Whether or 
not the structured schema is more perspicuous than the 
original is left to the reader’s judgement. 


7. SPACE-TIME OVERHEADS 


In developing and proving the effectiveness of the 
transforms, two important questions were left unasked: 
how are the basic unstructured forms identified in a 
general flowgraph, and how efficient are the resulting 
flowgraphs in respect of space and time? 

With regard to the first question, whilst decision and 
loop subgraphs are easily identified in suitably drawn 
flowgraphs their presence is less obvious in arbitrary 
ones. The identification of such subgraphs is a major 
topic outside the scope of this paper. The interested 
reader is referred to standard texts such as Schaefer²⁹
and Aho and Ullman³⁰ where suitable techniques and
further references can be found. 

On the question of efficiency, it is desirable to have 
some general measure for program schemas which is 
independent of particular interpretations. Such a measure 
could of course give no guidance in general on the 
efficiency of the transformations for particular interpre- 
tations of schemas, but could be useful as a means of 
comparing the results produced by the structuring 
process. One such measure is the space-time hierarchy 
for embedded graphs described by Lipton, Eisenstat and 
DeMillo (LED)³¹ and refined by them in Ref. 32. This 
measure can be defined informally as follows. Let G = 
(V, E) be a flowgraph in which the nodes V represent 
functions and predicates and the edges E the flow of 
control between them. (This definition is different from 
that used elsewhere in this paper; see Fig. 9 for the 
depiction of ID and ID-1 in this form.) Let d_G(u, v) be 
the minimum path length, calculated as the number of 
edges, between two nodes u, v of V, with u ≠ v. G is said 
to be embedded in a strictly equivalent flowgraph G* = 
(V*, E*) with respect to space S and time T if S is the 
largest number of duplications of any function or 
predicate of G contained in G*, and T is the least value 
satisfying d_G*(u*, v*) ≤ T·d_G(u, v). Thus S = n means 
that there are n occurrences of some function in G*, as 
against one in G, and no other function in G* has more 
occurrences than n. T = m means that two distinct 
functions (or predicates) having one edge between them 
in G have m edges between them in G*, and no other 
pair of distinct functions in G has a greater separation 
than m in G*. The embedding is denoted by G < 
(S, T) G*. 
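The informal definition above can be sketched in code. The following is a minimal illustration only, under stated assumptions: the function names, the `origin` labelling of nodes of G*, and both toy graphs are hypothetical and are not the paper's ID, ID-0 or ID-1 forms. It assumes that introduced flag nodes copy nothing in G, that the stretch need only be checked over the edges of G, and that each copy of a node is paired with the nearest reachable copy of its successor.

```python
from collections import Counter, deque


def bfs_dist(adj, src):
    """Edge-count distances from src along the directed edges of adj."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def space_time(g, g_star, origin):
    """Return (S, T) for G embedded in G* (a sketch, not the LED construction).

    g, g_star: dict mapping each node to a list of its successors.
    origin:    dict mapping each node of G* to the node of G it copies,
               or to None for an introduced flag node (assumption).
    """
    # S: the largest number of occurrences in G* of any single node of G.
    copies = Counter(o for o in origin.values() if o is not None)
    s = max(copies.values())

    # T: the largest separation in G* of nodes that are adjacent in G.
    t = 0
    for u_star, u in origin.items():
        if u is None:
            continue
        dist = bfs_dist(g_star, u_star)
        for v in g.get(u, ()):
            if v == u:
                continue
            reach = [d for x, d in dist.items() if origin[x] == v]
            if reach:
                t = max(t, min(reach))
    return s, t


# Hypothetical toy flowgraph G: p branches to f and g, both of which join at h.
g = {"p": ["f", "g"], "f": ["h"], "g": ["h"], "h": []}

# Toy G* obtained by duplicating the function h (space cost only).
g_dup = {"p": ["f", "g"], "f": ["h1"], "g": ["h2"], "h1": [], "h2": []}
origin_dup = {"p": "p", "f": "f", "g": "g", "h1": "h", "h2": "h"}

# Toy G* obtained by inserting a flag assignment and a flag test between
# f and h (time cost only).
g_flag = {"p": ["f", "g"], "f": ["set"], "set": ["test"],
          "test": ["h"], "g": ["h"], "h": []}
origin_flag = {"p": "p", "f": "f", "g": "g", "h": "h",
               "set": None, "test": None}

print(space_time(g, g_dup, origin_dup))    # (2, 1)
print(space_time(g, g_flag, origin_flag))  # (1, 3)
```

In the first toy case only S grows, and in the second only T grows, which is the kind of trade-off the measure is intended to expose.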

Returning to the structuring transforms, and noting 
that each introduced reference (assignment or test) to a 
flag adds a node to the LED graph of a buf, it can be 
shown that ID < (2, 1) ID-0 and ID < (1, 3) ID-1, and 
similarly for IL. For OD the relationship is OD < (1, 3) 
OD-1 while for OL it is OL < (1, 4) OL-1. Thus the 
Type-0 transforms require an increase of space alone 
(from function duplication), whereas the Type-1 transforms 
require an increase in time alone (from the 
introduction of flags). 

Figure 9. Embedding of ID in ID-1. 

Because S and T are defined in terms of extreme values 
rather than total ones, they are not necessarily additive 
over successive applications of the transforms. Thus for 
the generalized Flynn’s Problem, denoting the original 
and final flowgraphs by G and G* respectively gives G < 
(1, 3) G* despite three applications of Type-1 transforms. 

Intuitive concepts of the relative complexities of the 
basic unstructured forms and their structured counterparts 
are to some extent reflected by < (S, T), but the 
important topological distinction between duplicated 
functions and duplicated decision nodes is lost. It might 
be worthwhile replacing S by (S, D) where S is unchanged 
in meaning and D is computed like S but in respect of 
decision nodes only. For this purpose a test on a flag in 
G* is regarded as the same action as a test on the 
corresponding predicate in G. Thus for the generalized 
Flynn’s Problem the embedding becomes G < ((1, 3), 3) G*, 
indicating no duplicated functions, a maximum of three 
occurrences of at least one decision node (the TEST on 
Q) and an increased time factor (path length) of three for 
at least one path. This is the price to be paid for achieving 
structured form. 
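The proposed (S, D) refinement can be sketched as a simple count over the nodes of G*. This is an illustration only: the function name, the node kinds and the node inventory below are hypothetical and do not reproduce the actual structured schema for the generalized Flynn's Problem. It assumes that S counts literal copies of functions and predicates, that D counts decision nodes with each flag test treated as an occurrence of the predicate it stands for, and that flag assignments copy nothing in G.

```python
from collections import Counter


def space_and_decision(nodes):
    """Return (S, D) for a list of (origin, kind) pairs describing G*.

    origin: the node of G that this node of G* copies or stands for,
            or None for an introduced flag assignment.
    kind:   "function", "decision", "flag_test" or "flag_set"
            (hypothetical labels used only in this sketch).
    """
    # S: literal copies of functions and predicates of G.
    literal = Counter(o for o, k in nodes if k in ("function", "decision"))
    # D: decision nodes only, with a flag test counted as a test on the
    # corresponding predicate.
    tests = Counter(o for o, k in nodes if k in ("decision", "flag_test"))
    # A value of 1 means "no duplication"; it is also the default
    # when a graph has no nodes of the relevant kind.
    s = max(literal.values(), default=1)
    d = max(tests.values(), default=1)
    return s, d


# Hypothetical inventory: two functions, predicates p and q, one flag test
# standing for p, two standing for q, and four flag assignments.
g_star_nodes = [
    ("a", "function"), ("b", "function"),
    ("p", "decision"), ("q", "decision"),
    ("p", "flag_test"), ("q", "flag_test"), ("q", "flag_test"),
    (None, "flag_set"), (None, "flag_set"),
    (None, "flag_set"), (None, "flag_set"),
]

print(space_and_decision(g_star_nodes))  # (1, 3)
```

Here no function is duplicated (S = 1), while the predicate q is tested at three places in G* (D = 3), a distinction that S alone would not show.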


8. CONCLUSION 


It has been shown that any unstructured schema can be 
put into strictly equivalent structured form using simple 
transforms on basic unstructured forms. Further, the 
transforms used do not rely on the introduction of 
compound predicate expressions and therefore preserve 
the original schema’s logic as closely as possible. 
However, it must be said that the structuring process 
presented here is no substitute for good design. The 
transforms developed in this paper might help to unravel 
some knotty problems, but they cannot produce logical 
poetry from tangled nonsense. 


© Heyden & Son Ltd, 1982 



REFERENCES 


1. J. Bruno and K. Steiglitz, The expression of algorithms by 
charts, Journal of the ACM 19, 517-525 (Oct. 1972). 

2. C. Boehm and G. Jacopini, Flow diagrams, Turing machines, 
and languages with only two formation rules, Communications 
of the ACM 9, 366-371 (Sep. 1966). 

3. E. Ashcroft and Z. Manna, The translation of ‘GOTO’ programs 
to ‘WHILE’ programs, in Information Processing 71, ed. by 
C. V. Freiman, Vol. 1, pp. 250-255. North-Holland, Amsterdam 
(1972). 

4. D. E. Knuth and R. W. Floyd, Notes on avoiding GOTO 
statements, Information Processing Letters 1, 23-31 (1971); 
corrections 1, 177 (1972). 

5. H. D. Mills, Mathematical foundations for structured 
programming, Federal Systems Division, IBM Corp., Gaithersburg, MD, 
FSC 72-6012 (1972). 

6. T. Kasai, Translatability of flowcharts into While programs, 
Journal of Computer and Systems Sciences 9, 177-195 (Oct. 
1974). 

7. M. H. Williams, Generating structured flow diagrams: the nature 
of unstructuredness, The Computer Journal 20, 45-50 (Feb. 
1977); 20, 381-383 (Nov. 1977). 

8. M. H. Williams and H. L. Ossher, Conversion of unstructured 
flow diagrams to structured form, The Computer Journal 21, 
161-167 (May 1978). 

9. G. Oulsnam, Cyclomatic numbers do not measure complexity 
of unstructured programs, Information Processing Letters 9, 
207-211 (Dec. 1979). 

10. D. C. Cooper, Boehm and Jacopini's reduction of flow charts, 
Communications of the ACM 10, 263, 463 (1967). 

11. E. S. Bainbridge, Minimal while programs, in Lecture Notes in 
Computer Science, ed. by A. Mazurkiewicz, Vol. 45, pp. 180-186. 
Springer-Verlag, Berlin (1976). 

12. W. W. Peterson, T. Kasami and N. Tokura, On the capabilities of 
While, Repeat and Exit statements, Communications of the 
ACM 16, 503-512 (Aug. 1973). 

13. S. R. Kosaraju, Analysis of structured programs, Journal of 
Computer and Systems Sciences 9, 232-255 (Dec. 1974). 

14. W. A. Wulf, Programming without the GOTO, in Information 
Processing 71, ed. by C. V. Freiman, Vol. 1, pp. 408-413. 
North-Holland, Amsterdam (1972). 

15. H. Ledgard and M. Marcotty, A genealogy of control structures, 
Communications of the ACM 18, 629-639 (Nov. 1975). 

16. J. C. Cherniavsky, J. Keohane and P. B. Henderson, A note 
concerning top down program development and restricted exit 
control structures, Information Processing Letters 9, 8-12 (Jul. 
1979). 

17. G. Urschler, Automatic structuring of programs, IBM Journal of 
Research and Development 19, 181-194 (Mar. 1975). 

18. B. S. Baker, An algorithm for structuring flowgraphs, Journal of 
the ACM 24, 98-120 (Jan. 1977). 

19. E. Engeler, Structure and meaning of elementary programs, in 
Lecture Notes in Mathematics, ed. by E. Engeler, Vol. 188, 
pp. 89-101. Springer-Verlag, Berlin (1971). 

20. E. Wegner, Tree-structured programs, Communications of the 
ACM 16, 704-705 (Nov. 1973). 

21. T. J. McCabe, A complexity measure, IEEE Transactions on 
Software Engineering SE-2, 308-320 (Dec. 1976). 

22. E. W. Dijkstra, Go To statement considered harmful, 
Communications of the ACM 11, 147-148 [the second occurrence of 
these pages], 538, 541 (Mar. 1968). 

23. D. E. Knuth, Structured programming with go to statements, 
ACM Computing Surveys 6, 261-301 (Dec. 1974). 

24. M. H. van Emden, Programming with verification conditions, 
IEEE Transactions on Software Engineering SE-5, 148-159 
(Mar. 1979). 

25. D. C. Luckham, D. M. R. Park and M. S. Paterson, On formalized 
computer programs, Journal of Computer and Systems Sciences 
4, 220-249 (Aug. 1970). 

26. S. C. Kleene, Representation of events in nerve nets, in Automata 
Studies, ed. by C. E. Shannon and J. McCarthy, pp. 3-40. 
Princeton University Press, Princeton, New Jersey (1956). 

27. P. J. Denning, J. B. Dennis and J. E. Qualitz, Machines, 
Languages and Computation, Prentice-Hall, Englewood Cliffs, 
New Jersey (1978). 

28. R. C. Tausworthe, Standardized Development of Computer 
Software, Prentice-Hall, Englewood Cliffs, New Jersey (1977). 

29. M. Schaefer, A Mathematical Theory of Global Program 
Optimization, Prentice-Hall, Englewood Cliffs, New Jersey 
(1973). 

30. A. V. Aho and J. D. Ullman, Principles of Compiler Design, 
Addison-Wesley, Reading, Massachusetts (1977). 

31. R. J. Lipton, S. C. Eisenstat and R. A. DeMillo, Space and time 
hierarchies for classes of control structures and data structures, 
Journal of the ACM 23, 720-732 (Oct. 1976). 

32. R. A. DeMillo, S. C. Eisenstat and R. J. Lipton, Space-time 
tradeoffs in structured programming: an improved combinatorial 
embedding theorem, Journal of the ACM 27, 123-127 (Jan. 
1980). 


Received October 1981 


THE COMPUTER JOURNAL, VOL. 25, NO. 3,1982 387 




